Deep Syntax Annotation of the Sequoia French Treebank

Authors

  • Marie Candito
  • Guy Perrier
  • Bruno Guillaume
  • Corentin Ribeyre
  • Karën Fort
  • Djamé Seddah
  • Éric Villemonte de la Clergerie
Abstract

We define a deep syntactic representation scheme for French, which abstracts away from surface syntactic variation and diathesis alternations, and describe the annotation of deep syntactic representations on top of the surface dependency trees of the Sequoia corpus. The resulting deep-annotated corpus, named DEEP-SEQUOIA, is freely available, and hopefully useful for corpus linguistics studies and for training deep analyzers to prepare semantic analysis.

Similar resources

Annotation scheme for deep dependency syntax of French (Un schéma d'annotation en dépendances syntaxiques profondes pour le français) [in French]

We describe in this article an annotation scheme for deep dependency syntax, built from the surface annotation scheme of the Sequoia corpus, abstracting away from it and expressing the grammatical relations between content words. When these grammatical relations take part in verbal diatheses, we consider the diatheses as resulting from redistributions from the canonical diathesis, which we re...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Corpus Annotation within the French FrameNet: a Domain-by-domain Methodology

This paper reports on the development of a French FrameNet, within the ASFALDA project. While the first phase of the project focused on the development of a French set of frames and corresponding lexicon (Candito et al., 2014), this paper concentrates on the subsequent corpus annotation phase, which focused on four notional domains (commercial transactions, cognitive stances, causality and verb...

Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie

This article presents the methods, results, and precision of the syntactic annotation process of the Rhapsodie Treebank of spoken French. The Rhapsodie Treebank is a 33,000-word corpus annotated for prosody and syntax, licensed in its entirety under Creative Commons. The syntactic annotation contains two levels: a macro-syntactic level, containing a segmentation into illocutionary units (inclu...

An exploitation of the Prague Dependency Treebank: a valency case

The Prague Dependency Treebank (PDT) is a manually annotated part of the Czech National Corpus (Čermák 1997). Its size is approx. 90,000 sentences, i.e. 1.5 million words (tokens). Three layers of annotation (Hajič 2002) are used: the morphological layer, where lemmas and tags are annotated, the analytical layer, which roughly corresponds to the surface (shallow) syntactic structure of the sent...

Publication date: 2014